Thermodynamics of natural images 
The scale invariance of natural images suggests an analogy to the statistical mechanics of physical
systems at a critical point. Here we examine the distribution of pixels in small image patches 
and show how to construct the corresponding thermodynamics. We find evidence for criticality in a 
diverging specific heat, which corresponds to large fluctuations in how 'surprising' we find individual 
images, and in the quantitative form of the entropy vs. energy. The energy landscape derived 
from our thermodynamic framework identifies special image configurations that have intrinsic error 
correcting properties, and neurons which could detect these features have a strong resemblance to 
the cells found in primary visual cortex. 



I. INTRODUCTION 

From the familiar faces of our friends and family to 
objects of almost every size in environments of every 
type, the world that we see is full of structure. Although 
this structure seems obvious when we look at the world, 
providing a precise mathematical description has proven 
more difficult. One way to formulate this problem is to 
ask for a probability distribution of images such that, if 
we draw at random out of this distribution, the resulting 
images resemble those that we see in the natural environment. Such a probabilistic or generative model would
provide a rigorous basis for practical algorithms in image
coding, processing and recognition [1, 2]. It is also reasonable to hypothesize that our brains have learned at least
an approximation to this probabilistic model, allowing
us to form more efficient representations of the visual
world and to find efficient solutions of many seemingly
difficult computational problems. In this view, aspects
of vision ranging from the responses of individual neurons to gestalt perceptual rules would be seen not as artifacts of the brain's circuitry but rather as matched to
the statistical structure of the physical world [3, 4, 5].

One statistical feature of natural images that pro- 
vides a clue about the nature of the underlying prob- 
ability distribution is scale invariance. In particular, 
Field observed that the spatial patterns of image inten- 
sity from reasonably natural environments have power 
spectra that approximate $S(k) \propto 1/k^2$, which is what one
would expect from the hypothesis of scale invariance and 
simple dimensional analysis, and he suggested that this 
scaling behavior may have a direct connection to the dis- 
tribution of receptive field parameters across neurons in 
visual cortex [6] . The intuition that scale invariance is a 
strong constraint on the form of the probability distribu- 
tion comes from statistical mechanics. We recall that for 
systems in thermal equilibrium, the probability that we 
observe the system in state s is given by the Boltzmann 
distribution, $p_s \propto \exp(-E_s/T)$, where $E_s$ is the energy
of the state and $T$ is the absolute temperature [7]. For
most physical systems at generic values of the temperature and other parameters, correlations and power spectra are not scale invariant; rather there is some characteristic length $\xi$ that determines the distance beyond which
structures approach statistical independence. Scale invariance emerges only when we tune the temperature to
a special value $T_c$, the critical point which marks a second order phase transition between two different phases
(liquid and gas, ferromagnet and paramagnet, ...) [8].
The modern theory of critical phenomena teaches us that
such scale invariance can occur while violating the naive
expectations of dimensional analysis, so that power spectra can acquire "anomalous dimensions," $S \propto 1/k^{2-\eta}$.
Further, scaling extends beyond low order statistics, so 
that the full probability distributions are predicted to be 
invariant (but non-Gaussian) under appropriate scaling 
transformations. Both anomalous scaling and invariant 
non-Gaussian distributions for local features have been 
observed in an ensemble of natural scenes [9, 10].

The analogy between scaling in natural images and 
the behavior of physical systems at their critical point
raises the question of whether there are analogs to
the thermodynamic features of a critical point. Can we, 
for example, generalize a given natural image ensemble 
to a family of ensembles indexed by a "temperature," 
and show that there is something special (i.e., critical) 
about the temperature of the real ensemble? If there 
is an analog of the diverging specific heat at $T_c$, what
does this say about the nature of images? What are the 
order parameters that characterize the underlying phase 
transition? Here we report some preliminary results on 
these and related questions. 



II. THE IMAGE ENSEMBLE AND SCALING 

As an initial data set we returned to the image ensemble of Ref [9]. We focus here on the 45 images taken
at lower spatial resolution, corresponding to $256 \times 256$
pixel regions covering $\sim 15^\circ \times 15^\circ$ scenes in the woods
FIG. 1: Ensembles of natural images, and their quantized versions. (a) An example image from the ensemble [5]. (b) The
image from (a), after quantizing into two equally populated levels. Even when most intensity information is discarded the
image retains substantial structure. (c) Normalized power spectra for gray-level and black and white images. In both cases
$S(k) \sim k^{-\alpha}$. The small image size precludes a more accurate determination of the scaling exponent. (d) The full distribution
of black and white pixels in $3 \times 3$ patches is invariant to block scaling. The scaling is imposed either by block averaging the
original light intensity and then quantizing (top) or by using the majority rule on quantized pixels (bottom).



of Hacklebarney State Park in New Jersey; an example
is shown in Fig. 1a. Our path to the construction of
a thermodynamics involves sampling the distribution of 
images in small patches. To make this problem man- 
ageable, we quantize the grey scale images into just two 
levels, with the quantization threshold chosen so that the 
numbers of black and white pixels are exactly equal over 
the ensemble. 
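A minimal sketch of this quantization step follows. It assumes the grey-scale ensemble is held in a NumPy array of shape `(n_images, H, W)`. The function name `binarize` and the array layout are illustrative and are not taken from the original analysis. Using the ensemble-wide median as the threshold makes the two levels equally populated, up to ties at the threshold.

```python
import numpy as np


def binarize(images):
    """Quantize grey-scale images into two equally populated levels, +1 and -1.

    Hypothetical helper: the threshold theta is the median over the whole
    ensemble, so black and white pixels are equally common (up to ties).
    """
    theta = np.median(images)
    # Pixels strictly above theta become +1 (white); all others become -1 (black).
    sigma = np.where(images > theta, 1, -1).astype(np.int8)
    return sigma, theta


rng = np.random.default_rng(0)
images = rng.lognormal(size=(4, 32, 32))  # stand-in for grey-scale scenes
sigma, theta = binarize(images)
print(sigma.mean())  # 0.0: continuous data with an even pixel count split exactly in half
```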

It is important to verify that the rather harshly quantized images preserve interesting structures of the original scenes. First, by inspection of Fig. 1b we see that
objects and even parts of objects (branches and leaves 
on the trees) are recognizable. More quantitatively, if 



the original image is $\phi(x)$, then we have constructed a
discrete image $\sigma(x) = \mathrm{sgn}[\phi(x) - \theta]$, with $\theta$ chosen so
that $\langle\sigma\rangle = 0$, where $\langle\cdots\rangle$ represents an average over the
image ensemble. Power spectra are defined by

$$\langle\phi(x)\phi(x')\rangle = \int \frac{d^2k}{(2\pi)^2}\, S_\phi(k)\, e^{ik\cdot(x-x')}, \qquad (1)$$

and similarly for $S_\sigma$. In Fig. 1c we show the spectra $S_\phi$
and $S_\sigma$, averaged over the orientation of the 'momentum' vector $k$ and normalized by the total variance. We
see that both spectra exhibit scaling, with very similar exponents. As discussed in Ref [9], there is excess
power at high frequencies because of aliasing, and more
compelling evidence for scaling is obtained by combining
these data with an ensemble of images from the same environment at higher angular resolution, so that the full
range of $|k|$ spans 2.5 decades.
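The orientation-averaged spectrum can be estimated as sketched below. This is an illustrative version, not the original pipeline. Its assumptions are square images, periodic FFT boundaries, logarithmic $|k|$ bins, and normalization by the total power in all nonzero modes (proportional to the total variance). The name `radial_power_spectrum` is hypothetical. A power law $S \propto k^{-\alpha}$ then shows up as a straight line of slope $-\alpha$ in log-log coordinates.

```python
import numpy as np


def radial_power_spectrum(images, n_bins=24):
    """Orientation-averaged power spectrum S(|k|) of an image ensemble.

    images: float array (n_images, L, L). Returns (k, S), where k is the mean |k|
    (cycles/pixel) of the modes in each logarithmic bin and S is normalized so
    that the power summed over all nonzero modes equals 1.
    """
    n, L, _ = images.shape
    imgs = images - images.mean()
    # Ensemble-averaged |FFT|^2 estimates S(k) on the discrete k grid.
    power = np.mean(np.abs(np.fft.fft2(imgs)) ** 2, axis=0).ravel()
    f = np.fft.fftfreq(L)
    kmag = np.sqrt(f[:, None] ** 2 + f[None, :] ** 2).ravel()
    keep = kmag > 0  # drop the k = 0 (mean) mode
    kmag, power = kmag[keep], power[keep]
    # Log-spaced bins, so a power law appears as a straight line.
    edges = np.geomspace(kmag.min(), kmag.max() * (1 + 1e-9), n_bins + 1)
    idx = np.clip(np.digitize(kmag, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    filled = counts > 0
    k = np.bincount(idx, weights=kmag, minlength=n_bins)[filled] / counts[filled]
    S = np.bincount(idx, weights=power, minlength=n_bins)[filled] / counts[filled]
    return k, S / power.sum()
```

The scaling exponent can then be read off with `np.polyfit(np.log(k), np.log(S), 1)`. For synthetic noise images filtered by $1/|k|$ (so that $S \propto 1/k^2$), the fitted slope should come out near $-2$.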

In the quantized images, an $L \times L$ pixel region can
take on $2^{L^2}$ possible states, and our data set provides
$\sim 3 \times 10^6$ samples of these states, although these are
not independent. Thus we can expect to provide a good
sampling of the distribution of discretized image patches
for $L = 3$ or even $L = 4$, since $2^{16} < 3 \times 10^6$, but $L = 5$ is
out of reach with this data set. A direct estimate of the
entropy shows that $S(4 \times 4) = 11.154 \pm 0.002$ bits, much
less than the 16 bits we would obtain from random
pixels; similarly, we find $S(3 \times 3) = 6.580 \pm 0.003 < 9$ bits.
This quantifies our impression that a substantial amount 
of local structure is preserved in the discretized images. 
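A direct entropy estimate of this kind can be sketched as follows. The sketch encodes every $L \times L$ patch of the binary images as an integer and applies the naive plug-in estimator to the empirical distribution. The plug-in estimator is an assumption made for illustration. The text does not state which estimator or error analysis produced the quoted values, and the plug-in estimate is biased low when sampling is sparse. The helper names are hypothetical.

```python
import numpy as np


def patch_codes(sigma, L):
    """Integer code in [0, 2**(L*L)) for every L x L patch of each +/-1 image.

    Patches are taken at every position, so neighboring samples overlap and
    are not independent.
    """
    n, H, W = sigma.shape
    bits = (sigma > 0).astype(np.int64)
    h, w = H - L + 1, W - L + 1
    codes = np.zeros((n, h, w), dtype=np.int64)
    for dy in range(L):
        for dx in range(L):
            # Append pixel (dy, dx) of each patch as the next binary digit.
            codes = (codes << 1) | bits[:, dy:dy + h, dx:dx + w]
    return codes.ravel()


def plugin_entropy_bits(codes):
    """Naive plug-in entropy (bits) of the empirical patch distribution."""
    counts = np.bincount(codes)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))
```

For independent random pixels, the estimate for $2 \times 2$ patches approaches 4 bits. For a uniform image it is 0.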

We can also test for scaling more generally by asking
how the distribution of image states in $L \times L$ patches evolves
when we coarse-grain the images, and we can do this in
two ways. First, we can take the original grey scale images $\phi(x)$ and create a new image such that the value of
$\phi$ in each pixel of the new image is the average over a
$2^n \times 2^n$ block of pixels in the original image, and then
we can quantize these images. When we look at $3 \times 3$
patches in these filtered and quantized images, we again
have $2^9$ possible states, and we call the distribution over
these states $P_n$, where $P_0$ is the distribution obtained
from the original images without any blocking. In the
same spirit (and following the original approach in Ref),
we can take the quantized image $\sigma(x)$ and directly
create new quantized images by applying the majority rule
to the pixels in $3 \times 3$ blocks, and this can be iterated;
we call the resulting distributions of states in image
patches $\tilde{P}_n$, where again $\tilde{P}_0 = P_0$ is what we obtain
without blocking. Scale invariance is the claim that all
the $P_n$ and $\tilde{P}_n$ will be the same, independent of $n$, because the distribution of states is at a fixed point of this
"renormalization" transformation [8]. In Fig. 1d we test
this prediction, showing that it is obeyed with good accuracy over four decades in probability. There are more
significant deviations in the first step of coarse-graining
($n = 1$, at left in the figure), presumably because of the
effects of aliasing noted above. We emphasize that this
test of scale invariance involves the full, joint distribution
of image intensities in $3 \times 3$ patches, and thus goes beyond checking the power law behavior of the spectrum (a
second order moment) or the invariance of distributions
of features (e.g., the outputs of local filters) evaluated
at a single point. Similar results are obtained for $4 \times 4$
patches.
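The two coarse-graining routes can be sketched as follows. Block-averaging the grey levels and then quantizing gives the images behind $P_n$. Repeated $3 \times 3$ majority votes on the binary image give the images behind $\tilde{P}_n$. The function names are hypothetical, and trimming edge pixels that do not fill a whole block is our assumption. Averaging over a $2^n \times 2^n$ block is the same as applying $2 \times 2$ averaging $n$ times, which the test checks for $n = 2$.

```python
import numpy as np


def block_average(phi, b=2):
    """Average grey levels over non-overlapping b x b blocks (edge remainder trimmed)."""
    n, H, W = phi.shape
    h, w = H // b, W // b
    return phi[:, :h * b, :w * b].reshape(n, h, b, w, b).mean(axis=(2, 4))


def majority_rule(sigma, b=3):
    """Replace each b x b block of +/-1 pixels by its majority sign (b odd, so no ties)."""
    assert b % 2 == 1, "majority rule needs an odd block size"
    n, H, W = sigma.shape
    h, w = H // b, W // b
    total = sigma[:, :h * b, :w * b].reshape(n, h, b, w, b).sum(axis=(2, 4))
    return np.where(total > 0, 1, -1).astype(np.int8)
```

Applying `majority_rule` $n$ times and then collecting $3 \times 3$ patch statistics gives an estimate of $\tilde{P}_n$. Calling `block_average(phi, 2**n)` and then quantizing gives an estimate of $P_n$.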



III. TEMPERATURE AND SPECIFIC HEAT 

Small patches of our discrete images are described by
a set $\sigma$ of binary variables. Let us imagine that the
distribution of these image patches is really the Boltzmann distribution for some physical system at temperature $T = 1$, with some "energy" function $E(\sigma)$ describing
each possible patch,

$$P(\sigma) = \frac{1}{Z}\, e^{-E(\sigma)}. \qquad (2)$$

Then, following the methods used in the analysis of dynamical systems [12, 13], we can define the distribution
at any temperature $T$, since

$$P_T(\sigma) = \frac{e^{-E(\sigma)/T}}{\sum_{\sigma'} e^{-E(\sigma')/T}} = \frac{1}{Z(T)}\,[P(\sigma)]^{1/T}, \qquad (3)$$

$$Z(T) = \sum_\sigma [P(\sigma)]^{1/T}. \qquad (4)$$

We can define an entropy at each value of the temperature, $S(T) = -\sum_\sigma P_T(\sigma)\log P_T(\sigma)$, and then the usual
thermodynamic relations tell us that the heat capacity
is $C(T) = T\,dS(T)/dT$. It is also useful to note that the
heat capacity is proportional to the variance in energy
or log probability, $C(T) = \langle[\delta E(\sigma)]^2\rangle_T/T^2$, where $\langle\cdots\rangle_T$
denotes an average in the distribution $P_T(\sigma)$.
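Given estimated probabilities $P(\sigma)$ for the sampled patch states, Eqs. (3) and (4) can be evaluated directly, as sketched below. The sketch assumes the empirical probabilities are used as-is, with unobserved states simply absent, and it reports entropy and heat capacity in nats. The name `thermodynamics` is hypothetical. The test confirms numerically that the variance form of $C(T)$ agrees with $T\,dS/dT$.

```python
import numpy as np


def thermodynamics(p, T):
    """Entropy S(T) and heat capacity C(T), in nats, of P_T(sigma) ∝ P(sigma)**(1/T).

    p: probabilities of the sampled patch states (zeros are ignored).
    """
    log_p = np.log(p[p > 0])                  # equals -E(sigma) up to a constant
    w = log_p / T
    log_pT = w - np.logaddexp.reduce(w)       # log P_T, normalized stably
    pT = np.exp(log_pT)
    S = -np.sum(pT * log_pT)
    # C(T) = <(delta E)^2>_T / T^2, with E = -ln P + const
    mean_log_p = np.sum(pT * log_p)
    C = np.sum(pT * (log_p - mean_log_p) ** 2) / T ** 2
    return S, C
```

To reproduce curves like those described next, one would scan `T` over a grid and divide `C` by the number of pixels $N = L^2$ in the patch.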

In a system with a critical point, the specific heat 
should diverge at $T = T_c$. Of course this is true only in
the thermodynamic limit of large systems, correspond- 
ing here to image patches containing many pixels. Can 
we see precursors of this divergence in the small patches 
that we can actually sample? Figure 2a shows the spe- 
cific heat for 2 x 2, 3x3 and 4x4 patches in our image 
ensemble, calculated directly from our sampling of the 
distributions $P(\sigma)$. We see that, even when we normalize
by the number of pixels $N = L^2$ in each patch (since
we expect that the heat capacity is extensive), looking 
at larger patches reveals a larger specific heat with a 
clear peak as a function of temperature, and this peak 
is shifting toward T = 1 . 

To calibrate our intuition about the specific heat es- 
timated from small patches, we have done precisely 
analogous computations on the nearest neighbor fer- 
romagnetic Ising model in two dimensions, defined by 
$E(\sigma) = -J\sum_{\langle ij\rangle}\sigma_i\sigma_j$, where $\sum_{\langle ij\rangle}$ denotes a sum over
neighboring pairs of pixels. Monte Carlo simulations 
of this model generate binary "images," and so many 
of the practical sampling questions are very similar to 
those in our problem. We see in Fig. 2b that the specific heat again shows a peak which grows and moves
toward the true critical temperature $T_c = 1$ as we look
at larger patches. Quantitatively the behavior is actually 
less dramatic than in the images, perhaps because the di- 
vergence of the specific heat in the thermodynamic limit 
is very gentle (logarithmic). Although this Ising spin 
system certainly is much simpler than the ensemble of 
images, comparison of Fig 2a and 2b supports the idea 
that what we see in the images is consistent with an
underlying divergence of the specific heat at a critical
temperature close to the real temperature $T = 1$.
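The simulation details are not given in the text, so the following is only a plausible sketch of a Monte Carlo sampler for such binary "images". It uses single-spin Metropolis updates with periodic boundaries, and the coupling is set from Onsager's result $T_c = 2J/\ln(1+\sqrt{2})$ so that $T_c = 1$. The boundary conditions, sweep counts and the helper name `metropolis_ising` are our assumptions. Patches cut from the returned configurations can be analyzed in the same way as the binarized natural images.

```python
import numpy as np

# Onsager: T_c = 2J / ln(1 + sqrt(2)); this choice of J puts the critical point at T_c = 1.
J_CRIT = 0.5 * np.log(1.0 + np.sqrt(2.0))


def metropolis_ising(L, T, n_sweeps, rng, J=J_CRIT):
    """Sample an L x L nearest-neighbor Ising configuration (periodic boundaries).

    E(sigma) = -J * sum_<ij> sigma_i sigma_j; each sweep makes L*L single-spin
    Metropolis updates. Returns a +/-1 int8 array.
    """
    s = rng.choice(np.array([-1, 1], dtype=np.int8), size=(L, L))
    for _ in range(n_sweeps):
        for _ in range(L * L):
            i, j = rng.integers(L, size=2)
            nb = (int(s[(i + 1) % L, j]) + int(s[(i - 1) % L, j])
                  + int(s[i, (j + 1) % L]) + int(s[i, (j - 1) % L]))
            dE = 2.0 * J * int(s[i, j]) * nb  # energy change if spin (i, j) flips
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                s[i, j] = -s[i, j]
    return s
```

In practice one would discard an initial burn-in and space out the saved configurations to reduce correlations. This pure-Python loop is meant for clarity, not speed.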




FIG. 2: Diverging specific heats of natural images. (a) The specific heat $C/N = (T/N)\,dS(T)/dT$ constructed from natural
images for $L \times L$ patches of linear dimension $L = 2, 3, 4$ pixels. Away from the natural operating temperature ($T = 1$) the
distribution is defined by Eq. (3). The peak in the specific heat near $T = 1$ suggests that natural images are drawn from a
critical ensemble. (b) As in (a) but constructed from Monte Carlo simulations of an Ising model with $T_c = 1$; also shown is
the exact behavior in the thermodynamic limit [14]. Even in small patches we see hints of the underlying critical behavior.



IV. ENTROPY VS. ENERGY AND ZIPF'S LAW 

A complementary perspective on thermodynamics is 
the microcanonical ensemble, corresponding to fixed en- 
ergy rather than fixed temperature. The end result of 
the discussion will be an attempt to measure, for our 
image ensemble, the entropy as a function of energy. We 
begin with some standard results on how thermodynamic 
quantities are encoded in the plot of entropy vs. energy
and on how we can identify a critical point in this plot.

All thermodynamic quantities can be recovered from 
the partition function, 

$$Z(T) = \sum_\sigma e^{-E(\sigma)/T}. \qquad (5)$$

We can rewrite this sum by grouping together all states 
that have the same energy, 



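$$Z(T) = \sum_\sigma e^{-E(\sigma)/T} = \int dE \left[\sum_\sigma \delta\big(E - E(\sigma)\big)\right] e^{-E/T} = \int dE\, \rho(E)\, e^{-E/T}, \qquad (6)$$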



which defines the density of states $\rho(E)$. For a large system (patches with many pixels), the density of states becomes a smooth function, and we can define an entropy
$S(E)$ at fixed energy as the log of the number of states in
a narrow range of energies of width $\Delta$, so that $\rho(E) = (1/\Delta)\,e^{S(E)}$.
Then the partition function is

$$Z(T) = \frac{1}{\Delta}\int dE\, e^{S(E) - E/T}. \qquad (7)$$

Further, both the energy and entropy are extensive variables which should be proportional to the size of the system, $N$; here $N$ will be the number of pixels in a patch.
Then we define $\epsilon = E/N$ and $s(\epsilon) = S(E = N\epsilon)/N$, and
the partition function becomes

$$Z(T) = \frac{N}{\Delta}\int d\epsilon\, e^{N[s(\epsilon) - \epsilon/T]}. \qquad (8)$$

Now it is clear that, as $N$ becomes large, the integral will
be dominated by the point where the exponent is maximal,
that is, an energy such that $ds(\epsilon)/d\epsilon = dS(E)/dE = 1/T$.
This connects the (microcanonical) description at fixed
energy with the (canonical) description at fixed temperature. In addition, one can show that the specific heat
is (inversely) related to the second derivative of the entropy,

$$C = -\frac{N}{T^2}\left[\frac{d^2 s(\epsilon)}{d\epsilon^2}\right]^{-1}. \qquad (9)$$

FIG. 3: Indication of critical behavior in the plot of entropy vs. energy. (a) Results for rectangular image patches
of size 8 to 50 pixels. (b) Results for Monte Carlo simulations of the 2D Ising model; the smooth red curve is the exact
thermodynamic limit [14].



In this language, the divergence of the specific heat oc- 
curs where the second derivative of the entropy vs. en- 
ergy vanishes. This is the hallmark of a second order 
phase transition. 
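For completeness, here is the step behind Eq. (9). It uses only the saddle-point condition $dS/dE = 1/T$ from above and the extensive scaling $S = N s(\epsilon)$, $E = N\epsilon$; one more differentiation with respect to energy gives

```latex
\frac{1}{T} = \frac{dS}{dE}
\;\Longrightarrow\;
-\frac{1}{T^2}\,\frac{dT}{dE} = \frac{d^2 S}{dE^2} = \frac{1}{N}\,\frac{d^2 s}{d\epsilon^2}
\;\Longrightarrow\;
C = \frac{dE}{dT} = -\frac{N}{T^2}\left[\frac{d^2 s}{d\epsilon^2}\right]^{-1}.
```

For a concave entropy ($d^2s/d\epsilon^2 < 0$) the heat capacity is positive. $C/N$ stays finite unless $d^2s/d\epsilon^2 \to 0$.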

It is important to note that we can define $E$ for every
state that we observe simply as the log of the probability:
from Eq. (2), $E(\sigma) = -\ln P(\sigma) + c$, where $c$ defines the
(arbitrary) zero of energy, which we choose so that the
most probable state has zero energy. While it is tricky to
measure the density of states or distribution of energies,
it is easy to define the cumulative distribution,

$$\mathcal{N}(E) = \int_0^E dE'\,\rho(E'), \qquad (10)$$

which just counts the number of possible image patches
for which the observed log probability is greater than
$-E + c$. If $S(E)$ is increasing, then this integral is dominated by the behavior near its upper limit, so that

$$\mathcal{N}(E) = \frac{N}{\Delta}\int_0^{E/N} d\epsilon\, e^{Ns(\epsilon)} \qquad (11)$$

$$\approx \frac{1}{\Delta}\left[\frac{ds(\epsilon)}{d\epsilon}\right]^{-1} e^{Ns(\epsilon = E/N)}, \qquad (12)$$

and hence

$$s(\epsilon) = \frac{1}{N}\ln\mathcal{N}(E = N\epsilon) + \frac{1}{N}\ln\left[\Delta\,\frac{ds(\epsilon)}{d\epsilon}\right]. \qquad (13)$$

Notice that the second term in this equation vanishes for 
large N, and so we approximate the entropy per pixel as 
a function of energy per pixel by the first term. 



The results of the previous paragraph imply that if 
we just count the number of possible image patches with 
probability greater than a certain level, then we can con- 
struct the entropy vs. energy and hence derive all other 
thermodynamic functions. This is what we do in Fig. 3a 
for our image ensemble, using rectangular $L \times L'$ patches
of size from 8 pixels up to 50 pixels; in Fig. 3b we do the
same thing for our Monte Carlo simulations of the Ising 
model. As a practical matter it is important to note that 
sampling problems are less serious at low energies (states 
with high probability) , so we expect that even if we look 
at regions where we can't sample the whole distribution 
we will get the correct low energy behavior. 
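The counting procedure can be sketched in a few lines. The sketch sets the energy of each observed state to $E = -\ln P + c$, with $E = 0$ for the most probable state. It sorts states by energy, uses the rank as the cumulative count $\mathcal{N}(E)$, and keeps only the first term of Eq. (13). The helper name is hypothetical, ties are broken arbitrarily by the sort, and unobserved states are ignored. This matches the caveat above that only the low-energy part is reliably sampled. Counting states by rank is exactly Zipf's ranking, and the test uses this: for $p_r \propto 1/r^\alpha$ the estimated $s(\epsilon)$ is a line of slope $1/\alpha$.

```python
import numpy as np


def entropy_vs_energy(p, n_pixels):
    """Per-pixel (energy, entropy) pairs from the probabilities of observed patch states.

    E(sigma) = -ln P(sigma) + c, with c fixed so the most probable state has E = 0;
    N(E) counts states with energy <= E, and s(eps) ≈ ln N(E = N eps) / N,
    i.e. Eq. (13) without its O(1/N) correction.
    """
    p = np.sort(p[p > 0])[::-1]          # most probable state first
    E = np.log(p[0]) - np.log(p)         # E = 0 for the most probable state
    n_below = np.arange(1, p.size + 1)   # cumulative count N(E)
    return E / n_pixels, np.log(n_below) / n_pixels
```

With `p` set to the empirical patch frequencies, a plot of the returned entropy against energy gives the kind of curve shown in Fig. 3a.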

We first note that the results on image patches of dif- 
ferent sizes are remarkably consistent with one another, 
suggesting that we are seeing signs of the thermody- 
namic limit despite the small size of the regions that 
we can explore fully. Next, since the real image ensem- 
ble is at T = 1, we want to pick out the energy at which 
$dS/dE = 1$; this seems very difficult, since the plot is
very nearly a straight line with unit slope. But we know 
what this means: if the point where dS/dE = 1 is also 
a place where $d^2S/dE^2 = 0$, then $T = 1$ is a critical
point. Thus, the fact that S/N vs. E/N is very nearly 
a straight line of unit slope is direct evidence that the 
ensemble of natural images is at criticality. 

It is interesting that this approach to the thermodynamics of images is connected to Zipf's law [17]. We recall that Zipf estimated the probability distribution from which words are drawn in an English text, and argued that if we put the words in rank order ($r = 1$ is the most common word), then $p_r \propto 1/r$ up to some maximum rank $r = N_w$ corresponding to the number of different words $N_w$ used in the text. Subsequently, other authors have considered generalized Zipf-like distributions [18], $p_r \propto 1/r^\alpha$, and there has been much discussion about the meaning of these relationships. Suppose we identify the Zipf-like distribution $p_r = A/r^\alpha$ with a Boltzmann distribution at $T = 1$, $p_r = (1/Z)e^{-E_r}$. Then the energy of the state at rank $r$ is $E_r = \alpha\ln r - \ln(AZ)$. In the limit of a large system with many possible states, we can approximate the density of states by realizing that the variable $r$ has a uniform distribution, and hence $\rho(E) \approx |dE_r/dr|^{-1}$; this gives $\rho(E) = r/\alpha$. But we also have $r = (AZ)^{1/\alpha}e^{E_r/\alpha}$, so we find

$$\rho(E) = \frac{1}{\alpha}(AZ)^{1/\alpha}e^{E/\alpha} \;\Rightarrow\; S_{\rm Zipf}(E) = \frac{E}{\alpha} + \text{constant}. \qquad (14)$$

Thus a generalized Zipf's law is equivalent to an entropy that is (exactly!) linear in the energy. The original Zipf's law ($\alpha = 1$) corresponds to a unit slope, as we have found for image patches. Further, we have seen that this simple linear relation corresponds exactly to what we find for a thermodynamic system at a critical point [13].

FIG. 4: Locally stable states. (a) The 49 most probable patches such that all single-pixel inversions decrease their probability in the ensemble. Even in small ($4 \times 4$) patches these locally stable states are interpretable as lines and edges at all positions and orientations. (b) The average surrounding $10 \times 10$ light-intensity images leading to these metastable states. These images resemble the average images that trigger responses of neurons in primary visual cortex.

Why does it matter that the ensemble of natural im- 
ages is at a critical point? The signature of criticality is 
the divergence of the specific heat, and the specific heat 
is the variance in the energy, which is the log probabil- 
ity. Thus, being at a critical point means that the log 
probability has an enormously broad distribution, with 
a formally divergent second moment even once we nor- 
malize by the number of pixels. One consequence is that 
the approach toward typicality in the sense of information
theory [15] will be much slower than one would find
away from the critical point, which may be related to 
difficulties in compressing large natural images, or even 
in estimating their entropy (see, for example, Ref. [16]).



The large variance in log probability also means that 
there are large fluctuations in how surprised we should 
be by any given scene or segment of a scene, which per- 
haps quantifies our common experience. 



V. THE ENERGY LANDSCAPE 

Critical points mark the transition between phases 
characterized by different forms of order: liquid vs. gas, 
ferromagnet vs. paramagnet, and so on. What is the 
ordering that would emerge if somehow the distribution 
of natural images could be "cooled" from T = 1 down 
toward T = 0? This ultimately is a question about the 
nature of the image patches that correspond to the low 
energy states. Certainly the lowest energy states of small 
patches are solid black or white blocks, as in a ferromag- 
net where all the spins can align up or down, and these 
states will dominate at $T = 0$. But, searching through all
$4 \times 4$ patches, we find $\sim 100$ states that are local minima
of the energy, in the sense that flipping any single pixel 
from black to white (or vice versa) results in increased 
energy or reduced probability. In Figure 4a we show 49 
of these states, ordered in decreasing probability. We 
see that many of these states are interpretable, for ex- 
ample as edges between dark and light regions, and that 
much of the multiplicity arises from the different ways 
of realizing these patterns (e.g., the six possible cases 
of a single vertical edge). We can think of these local 
minima in the energy landscape [20] as being like the attractors in the Hopfield model of neural networks [21], or
like the code words in statistical mechanics approaches
to error-correcting codes [22]. Usually we think of error-correcting coding as a construct, but here it seems that
the signals which the world presents to us have some
intrinsic error-correcting properties. 
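The exhaustive search for local minima can be sketched as follows. It assumes the energies $E = -\ln P$ of all $2^{16}$ possible $4 \times 4$ patches are stored in an array indexed by each patch's integer code (bit $k$ = pixel $k$). It also assumes unobserved patches are assigned $E = +\infty$, an illustrative convention. A state counts as a local minimum when every single-pixel flip strictly raises its energy. The function name is hypothetical. The test uses a $2 \times 2$ Ising energy, whose only strict minima are the two uniform patches.

```python
import numpy as np


def local_minima(energy, n_pixels):
    """Codes of states whose energy strictly increases under every single-pixel flip.

    energy: array of length 2**n_pixels indexed by patch code (bit k = pixel k);
    states that were never observed can be given energy = np.inf.
    """
    codes = np.arange(2 ** n_pixels)
    is_min = np.ones(codes.shape, dtype=bool)
    for k in range(n_pixels):
        flipped = codes ^ (1 << k)          # the same patches with pixel k inverted
        is_min &= energy[flipped] > energy
    return codes[is_min & np.isfinite(energy)]
```

Sorting the returned codes by increasing energy (that is, decreasing probability) gives an ordering like that of Fig. 4a.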

Although there are many reasons why edges may be 
important for vision, it is interesting to take seriously 
the idea that such image features acquire their impor- 
tance because of their intrinsic properties of error cor- 
rection, as if these are the signals that the world is "try- 
ing" to send us in the most fault tolerant fashion. If 
this is the case, then the visual system might build fea- 
ture detecting neurons which serve to identify the basins 
of attraction defined by these local minima in energy. 
If such cells respond only when the original grey scale 
image corresponds to a discrete image within a partic- 
ular basin of attraction, then it is easy to compute the 
response-triggered average within our natural image ensemble, with the results shown in Fig. 4b. These results
have a strong resemblance to the spike-triggered average
responses of neurons in visual cortex to natural scenes
[25]. Interestingly, if we look more closely at the problem of identifying the basin of attraction, we find that
perceptron-like models based on filtering through a single receptive field do rather poorly. Thus, if visual cortex really builds a representation of the world based on
the identification of these local minima in the energy
landscape, the computations involved must include
nonlinear combinations of multiple filters, as observed
[24]. Much remains to be done to see if this really is a
path to a theory of these more complex neural responses.
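The text does not say how each binary patch was assigned to a basin of attraction. The sketch below assumes greedy steepest descent: the pixel flip that lowers the energy most is taken repeatedly until no flip helps. It then averages the aligned grey-scale surrounds of all samples that land in a chosen basin, giving a response-triggered average in the sense described above. The descent rule, the data layout (`grey_patches[m]` aligned with binary code `codes[m]`) and the function names are all assumptions made for illustration.

```python
import numpy as np


def descend(code, energy, n_pixels):
    """Greedy single-pixel descent: flip the pixel that lowers E most until none does.

    Returns the local minimum (attractor) whose basin contains `code`.
    """
    while True:
        neighbors = [code ^ (1 << k) for k in range(n_pixels)]
        best = min(neighbors, key=lambda c: energy[c])
        if energy[best] >= energy[code]:
            return code
        code = best


def triggered_average(grey_patches, codes, energy, n_pixels, target):
    """Mean grey-level surround over samples whose binary patch lies in the basin of `target`.

    grey_patches[m] is the grey-scale surround aligned with binary patch codes[m].
    Returns NaNs (with a runtime warning) if no sample falls in that basin.
    """
    basins = np.array([descend(int(c), energy, n_pixels) for c in codes])
    return grey_patches[basins == target].mean(axis=0)
```

Each descent step strictly lowers the energy, so the loop always terminates on a finite state space.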